Abstract

Much of the computational effort of the finite element process in- 
volves the solution of a system of linear equations. The coefficient 
matrix of this system, known as the global stiffness matrix, is 
symmetric, positive definite, and generally sparse. An important 
technique for reducing the time required to solve this system is 
substructuring or matrix partitioning. Substructuring is based on 
the idea of dividing a structure into pieces, each of which can then 
be analyzed relatively independently. As a result of this division, 
each point in the finite element discretization is either interior to 
a substructure or on a boundary between substructures. Contri- 
butions to the global stiffness matrix from connections between 
boundary points form the Kbb matrix. This paper focuses on the
triangularization of a general Kbb matrix on a parallel machine.


Support for this research was provided in part by the National Aeronautics and 
Space Administration under contract number NAS1-17130 while the author was in
residence at ICASE, NASA Langley Research Center, Hampton, VA 23665, and in 
part by the National Science Foundation under Grant No. MCS-8305693. 





1. Introduction 

The finite element method is an important tool for determining approximate
solutions to systems of differential equations arising in such diverse physical prob- 
lems as structural analysis, fluid flow, and heat transport. In the finite element 
approach, a region of interest (e.g., an airplane wing, a cross section of a pipe, or 
a nuclear reactor core) is discretized into individual elements. The solution then 
gives displacements, vorticities, or temperatures at those points where two or more 
elements are joined together. A complete description of the method has appeared 
numerous times in the literature, cf. [7]. Much of the computational effort of the 
finite element process involves the solution of a system of linear equations. The co- 
efficient matrix of this system, known as the global stiffness matrix, is symmetric, 
positive definite, and generally sparse. To reduce the time required to solve this
system, researchers have examined many techniques, among which is substructuring
or matrix partitioning, cf. [6].

Substructuring is based on the idea of dividing a structure into pieces, which 
can then be analyzed relatively independently. As a result of this division, each 
point in the discretization is either interior to a substructure or on a boundary
between substructures. The application of substructuring techniques has two obvious
advantages. First, if the finite element method is applied to a structure, and a 
portion of that structure is then changed in some way, only interior and boundary 
points of the substructures which have changed need to be reexamined. This is an 
advantage, for example, in the case of a researcher considering aircraft structure 
who wishes to fit a new wing to an existing model. Second, with the increased avail- 
ability of parallel processors, substructuring is a natural way to decompose a finite 
element problem into relatively independent subproblems. A separate processor 
can then be applied to the solution of each subproblem. 

The discretization process in the finite element method gives rise to a graph in 
a very natural way. Points where two or more elements join together are the vertices 
or nodes of the graph; two nodes which border on a common element are joined by 
an edge. The structure of the global stiffness matrix is dependent upon the ordering 
of the nodes in this finite element graph and is, in fact, equivalent to the structure 
of the graph’s adjacency matrix. If the nodes corresponding to interior points are 
numbered first, one substructure at a time, followed by the nodes corresponding 
to boundary points, the global stiffness matrix will have the form of the matrix in 
Figure 1, where $K_{ii}^{(j)}$ represents the contributions from connections between interior
points of substructure j, $K_{ib}^{(j)}$ and $K_{bi}^{(j)}$ represent the connections between interior
points of substructure j and the set of boundary points, and Kbb represents the
contributions from boundary-to-boundary connections.

Figure 1. Global Stiffness Matrix Structure
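As a small illustration of this numbering, the sketch below builds the finite element graph from a list of elements and prints the nonzero pattern obtained when interior nodes are numbered first. The helper names `element_graph` and `nonzero_pattern`, and the four-element bar split into two substructures at node 2, are assumptions made for illustration only; they do not come from the paper.

```python
from itertools import combinations


def element_graph(elements):
    """Join every pair of nodes that border on a common element."""
    adj = {}
    for element in elements:
        for node in element:
            adj.setdefault(node, set())
        for a, b in combinations(element, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def nonzero_pattern(adj, ordering):
    """0/1 pattern of the stiffness matrix for the given node ordering."""
    return [[int(u == v or v in adj[u]) for v in ordering] for u in ordering]


# Hypothetical example: a bar of four two-node elements, nodes 0..4.
# Substructure 1 has interior nodes {0, 1}, substructure 2 has {3, 4},
# and node 2 is the single boundary node, so it is numbered last.
bar = element_graph([(0, 1), (1, 2), (2, 3), (3, 4)])
for row in nonzero_pattern(bar, [0, 1, 3, 4, 2]):
    print(" ".join(str(entry) for entry in row))
```

The two interior blocks share no nonzero entries with each other; they are coupled only through the last row and column, which belong to the boundary node.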

The question of the amount of parallelism inherent in the finite element process 
as a whole has been examined, cf. [1]. The focus of this paper is the triangularization
of a general Kbb matrix on a parallel machine. This portion of the problem is of 
considerable importance for at least two reasons. First, if any part of the structure 
or region of interest is changed, at least a portion of the Kbb matrix must be re- 
triangularized. Second, as the number of processors in parallel machines increases, 
there will be a tendency to partition structures into more substructures. As the 
number of substructures increases, so does the relative number of boundary points 
and thus the relative size of the Kbb matrix. 

It is assumed that during the solution of the system of linear equations required 
by the finite element process, the global stiffness matrix has been reduced by means 
of Gaussian elimination to the form of the matrix in Figure 2, so that only the Kbb 
matrix is yet to be triangularized. 
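In block terms, this reduction can be written as a standard Schur complement. The notation is an assumption made here for illustration: $K_{ii}^{(j)}$, $K_{ib}^{(j)}$ and $K_{bi}^{(j)}$ denote the interior and coupling blocks of substructure $j$, $p$ is the number of substructures, and $\hat{K}_{bb}$ is the boundary matrix that remains once all interior unknowns have been eliminated.

```latex
\hat{K}_{bb} \;=\; K_{bb} \;-\; \sum_{j=1}^{p} K_{bi}^{(j)} \left( K_{ii}^{(j)} \right)^{-1} K_{ib}^{(j)}
```

Each term of the sum involves a single substructure, which is why the interior eliminations can proceed independently of one another. The Kbb matrix studied in the remaining sections is understood to include these contributions.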


2. Parallel Gaussian Elimination 

The ordering of the rows and columns of a given matrix (or equivalently, the 
ordering of the nodes in the corresponding graph) required to minimize parallel 
Gaussian elimination times has been studied by Leuze and Saxton [5]. Following 
the development of their model, assume a completely connected parallel machine 
with an arbitrarily large number of processors. The triangularization of a symmet- 
ric system of linear equations could be programmed on such a machine as follows. 
Each row is stored in a separate processor as a list of nonzero coefficients with a 
stack of corresponding column indices. This stack indicates to a processor what 
communication with other processors is required. At each step of the elimination
process, each processor examines its stack. Suppose machine i (the processor
containing row i) finds the index of machine j at the top of its stack (i < j). Machine i
then sends its current row information to machine j and pops its stack. When
machine j finds the index for machine i at the top of its stack, it uses the row
information from machine i to eliminate row j's coefficient in column i and then
updates its stack by merging in the stack of machine i. When all stacks are empty,
the coefficient matrix is in upper triangular form.
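The process just described can be simulated symbolically, counting only time steps and ignoring numerical values. The following is a minimal sketch under two assumptions that are not stated in the text: a transfer takes place in a step only when machines i and j each find the other at the top of their stacks, and the receiving machine merges the sender's full set of higher-numbered columns (rather than only the remainder of its stack) so that fill stays symmetric. The function name `simulate_parallel_elimination` is hypothetical.

```python
def simulate_parallel_elimination(adj):
    """Count synchronous steps of the stack-driven elimination.

    adj maps each row number (1..n, in elimination order) to the set of
    rows it is coupled to; the structure is assumed to be symmetric.
    """
    stacks = {i: sorted(cols) for i, cols in adj.items()}
    upper = {i: {j for j in cols if j > i} for i, cols in adj.items()}
    steps = 0
    while any(stacks.values()):
        # Machine i may send to j when each is at the top of the other's stack.
        transfers = [
            (i, stack[0])
            for i, stack in stacks.items()
            if stack and stack[0] > i
            and stacks[stack[0]] and stacks[stack[0]][0] == i
        ]
        if not transfers:
            raise ValueError("no transfer possible; is the structure symmetric?")
        for i, j in transfers:
            stacks[i].pop(0)                # machine i has served machine j
            stacks[j].pop(0)                # machine j has eliminated column i
            incoming = upper[i] - {j}       # other columns present in row i
            upper[j] |= {k for k in incoming if k > j}
            stacks[j] = sorted(set(stacks[j]) | incoming)
        steps += 1
    return steps


# A path of five nodes numbered consecutively (tridiagonal matrix) ...
tridiagonal = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
# ... and the same path with both ends numbered first and the middle node last.
ends_first = {1: {3}, 2: {4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}}
print(simulate_parallel_elimination(tridiagonal))
print(simulate_parallel_elimination(ends_first))
```

Under these assumptions the consecutively numbered path needs one step more than the reordered one, because the reordering lets two eliminations proceed at the same time.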

Leuze and Saxton then developed a graph theoretic model for parallel Gaussian 
elimination. Their notation in [5] is followed here. Given an $n \times n$ symmetric matrix
$A = \{a_{ij}\}$, define a graph $G = (V, E)$ where $V = \{r_1, r_2, \ldots, r_n\}$ (one vertex per
row), and $E = \{(r_i, r_j) \mid a_{ij} \neq 0 \text{ and } i \neq j\}$. An ordering of $V$ is a bijection
$f: \{1, 2, \ldots, n\} \to V$. $G_f = (V, E, f)$ denotes an ordered graph. By application of a
sequence of row and column interchanges, the matrix $A$ can be made to correspond
to $G_f$ for any ordering $f$. For each vertex $i$, the fill of $G_f$ for $i$ is the set of edges
$\{(j, k) \mid i < j < k,\ (i, j) \in E,\ (i, k) \in E, \text{ and } (j, k) \notin E\}$. The filled graph for
ordered graph $G_f$ is defined by adding the fill of $G_f$ for each vertex $i$ in order
$i = 1, 2, \ldots, n$. The fill of $G_f$ corresponds to additional nonzero coefficients of $A$
introduced during the elimination process. A parallel time function $t: V \times V \to \mathbb{N}$
is defined for a filled ordered graph as follows:

$$t(1, 1) = 0.$$

For $i > 0$ and $j > i$,

$$t(i, j) = \begin{cases} t(i, j - 1), & \text{if } (i, j) \notin E; \\ \max\bigl[\, t(i, j - 1) + 1,\ \max\{\, t(k, j) + 1 \mid k < i,\ (k, j) \in E \,\} \,\bigr], & \text{otherwise.} \end{cases}$$

For $1 < i \le n$,

$$t(i, i) = \begin{cases} 0, & \text{if for all } k < i,\ (k, i) \notin E; \\ \max\{\, t(k, i) \mid k < i \text{ and } (k, i) \in E \,\}, & \text{otherwise.} \end{cases}$$
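One reading of this timing function can be evaluated directly. The sketch below first adds the fill edges and then computes the time values row by row. It assumes that a non-adjacent pair (i, j) simply carries forward the value for (i, j - 1), which is an interpretation of the definition rather than something the text confirms; the function names `filled_graph` and `parallel_time` are hypothetical.

```python
def filled_graph(n, edges):
    """Adjacency sets of the filled graph for vertices numbered 1..n."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    for i in range(1, n + 1):
        higher = sorted(v for v in adj[i] if v > i)
        # Eliminating vertex i joins all of its higher-numbered neighbours.
        for pos, j in enumerate(higher):
            for k in higher[pos + 1:]:
                adj[j].add(k)
                adj[k].add(j)
    return adj


def parallel_time(n, edges):
    """Largest value of the timing function over the filled ordered graph."""
    adj = filled_graph(n, edges)
    t = {}
    for i in range(1, n + 1):
        # Row i is ready once every lower-numbered neighbour has been applied.
        t[i, i] = max((t[k, i] for k in adj[i] if k < i), default=0)
        for j in range(i + 1, n + 1):
            if j not in adj[i]:
                t[i, j] = t[i, j - 1]       # assumed carry-forward case
            else:
                waits = [t[k, j] + 1 for k in adj[j] if k < i]
                t[i, j] = max([t[i, j - 1] + 1] + waits)
    return max(t.values())


path_consecutive = [(1, 2), (2, 3), (3, 4), (4, 5)]
path_ends_first = [(1, 3), (3, 5), (4, 5), (2, 4)]
print(parallel_time(5, path_consecutive), parallel_time(5, path_ends_first))
```

Both orderings of the five-node path produce no fill, yet under this reading they differ in the number of parallel time steps.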



Figure 3. Ordered Graphs 

In Figure 3, the arcs of the ordered graphs are labelled with the appropriate 
values from the timing function. Both orderings are optimum with respect to fill, 
but only ordering B is optimum with respect to the timing function. 

Leuze and Saxton present no algorithms for optimum parallel orderings and, 
in fact, conjecture that the problem is NP-complete. They do, however, present 
two interesting results. First, it is demonstrated that the frequently used orderings 
which cluster nonzero elements near the diagonal are non-optimum with respect 
to the parallel time function. This phenomenon is easily understood through an 
examination of a matrix with nonzero elements tightly clustered near the diagonal, 
a tridiagonal matrix. In an N × N tridiagonal matrix, row i (1 < i < N) cannot
be used in the elimination process until its first nonzero element is eliminated with
information from row i - 1. The process is forced to proceed sequentially from row 1
to row N. Second, a system of linear equations which can be solved without fill
but for which any minimum parallel time ordering produces fill is presented. This 
system demonstrates that in the parallel model, there is not a perfect correlation 
between the amount of fill and the number of time steps required to solve a system, 
as is the case in the sequential model. 

In this paper, orderings for the class of Kbb matrices will be examined. The 
special characteristics of the Kbb matrix will be examined in Section 3, heuristic or- 
derings applied to Kbb matrices will be discussed in Section 4, and parallel Gaussian 
elimination times resulting from the various orderings will be presented in Section 5. 

3. The Structure of the Kbb Matrix 

When attempting to determine the Kbb matrix structure, the following obser- 
vation about fill in a symmetric matrix is important. If, in the undirected graph 
corresponding to matrix A , there exists a path between node i and node j with 
intermediate nodes numbered less than min{i,j}, fill will occur in matrix elements 
Aij and Aji, cf. [4], Ch. 5. Consequently, if the interior nodes of a substructure form
a connected set, each boundary node of that substructure will be connected (either 
originally or by means of a fill edge) to every other boundary node of that substruc- 
ture. It thus seems appropriate to divide the set of boundary nodes into subsets 
such that all nodes in a subset are boundary nodes to the same set of substructures. 

For example, consider the division of a cube into eight substructures as indi- 
cated in Figure 4. The set of boundary nodes can be divided into subsets, such as 
the set of nodes on the face between substructures 1 and 2, the set of nodes on the 
edge between substructures 1, 2, 3, and 4, the single node which borders all eight 
substructures, etc. If each subset is designated by the substructures on which it 
borders, Table 1 contains a complete list of boundary node subsets for this example. 

If it is assumed that a boundary node is originally connected only to interior 
nodes and other boundary nodes of the substructures it borders, then any two nodes 
in a given subset are indistinguishable with respect to connectivity, i.e., connected 
to precisely the same nodes. Based on this observation, it seems reasonable to 
assume that nodes within a given subset should be numbered consecutively. The 
question of how subsets should be ordered relative to each other then arises. This 
question will be examined in detail in Section 4. 

The Kbb matrix can then be considered a partitioned matrix with partitions 
separating groups of rows or columns corresponding to boundary node subsets.

Figure 4. Substructured Cube

Table 1. Boundary Node Subsets

 (1) 12     (7) 37    (13) 1234
 (2) 13     (8) 48    (14) 1256
 (3) 15     (9) 56    (15) 1357
 (4) 24    (10) 57    (16) 2468
 (5) 26    (11) 68    (17) 3478
 (6) 34    (12) 78    (18) 5678
                      (19) 12345678
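The grouping of boundary nodes can be reproduced for the eight-subcube example with a short sketch. It assumes a regular grid with n nodes along each edge of a subcube and a substructure numbering of 1 + x + 2y + 4z for the subcube in half (x, y, z); that numbering is chosen here because it matches the labels of Table 1 and is not stated in the text. The function name `boundary_subsets` is hypothetical.

```python
from itertools import product


def boundary_subsets(n):
    """Group boundary nodes of the 2 x 2 x 2 cube by the subcubes they border.

    Returns a dict mapping a label such as "1234" to the list of node
    coordinates in that subset. Each subcube has n nodes per edge.
    """
    mid = n - 1                      # coordinate of the cutting planes
    subsets = {}
    for node in product(range(2 * n - 1), repeat=3):
        # Along each axis a node lies in the low half, the high half, or both.
        halves = [(0,) if c < mid else (1,) if c > mid else (0, 1) for c in node]
        bordering = sorted(1 + x + 2 * y + 4 * z for x, y, z in product(*halves))
        if len(bordering) > 1:       # nodes bordering one subcube are interior
            label = "".join(str(s) for s in bordering)
            subsets.setdefault(label, []).append(node)
    return subsets


subsets = boundary_subsets(4)
for label in sorted(subsets, key=lambda s: (len(s), s)):
    print(label, len(subsets[label]))
```

For n = 4 this yields twelve face subsets, six edge subsets, and the single central node, with every node in a subset bordering exactly the same subcubes.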


Since each boundary node of a substructure is connected to every other boundary 
node of that substructure, the blocks resulting from such a partitioning will either be 
composed entirely of nonzero elements or be composed entirely of zero elements. If 
the “row subset” and “column subset” of a block share a common substructure, that 
block will be nonzero; otherwise, that block will be zero. Furthermore, whenever 
fill occurs during the triangularization of the Kbb matrix, an entire block will be 
filled. Consequently, the structure of the Kbb matrix can be represented by a matrix
with one row and one column per boundary node subset, together with information 
about the size of each subset. For example, the subdivided cube of Figure 4 with
the boundary node subset ordering of Table 1 can be represented by the matrix of
Figure 5. 



Figure 5. Kbb Matrix Structure
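The block structure follows mechanically from the subset labels: a block is nonzero exactly when its row subset and column subset share a substructure. The sketch below applies that rule to the labels of Table 1. It assumes single-digit substructure numbers so that each character of a label is one substructure; the names `LABELS` and `block_structure` are hypothetical.

```python
# Boundary node subsets of the eight-subcube example, in the order of Table 1.
LABELS = ["12", "13", "15", "24", "26", "34", "37", "48", "56", "57", "68", "78",
          "1234", "1256", "1357", "2468", "3478", "5678", "12345678"]


def block_structure(labels):
    """1 where the row and column subsets share a substructure, else 0."""
    return [[int(bool(set(row) & set(col))) for col in labels] for row in labels]


for row in block_structure(LABELS):
    print("".join("#" if entry else "." for entry in row))
```

Each `#` stands for a full nonzero block and each `.` for a zero block; the last row and column are completely full because the central node borders all eight subcubes.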


In Figure 5, a black square represents a nonzero block; a white square, a zero 
block. Block dimensions can be determined from boundary subset sizes. In this 
example, if each subcube (including boundary nodes) contains n^3 nodes, each “face”
(bordering on only two subcubes) contains (n - 1)^2 nodes, each “edge” (bordering
on four subcubes) contains n - 1 nodes, and the central “point” (bordering on all
eight subcubes) consists of a single node. 
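These subset sizes give the dimension of the Kbb matrix for the eight-subcube example. The count below uses the twelve faces, six edges, and one point of Table 1, and assumes that nodes on the outer surface of the cube are counted as interior nodes because they border only one subcube; $N_b$ and $N_i$ are symbols introduced here for the numbers of boundary and interior nodes.

```latex
N_b = 12\,(n-1)^2 + 6\,(n-1) + 1, \qquad
N_i = 8\,(n-1)^3, \qquad
N_i + N_b = \bigl(2(n-1) + 1\bigr)^3 = (2n-1)^3 .
```

The last identity is the binomial expansion of $(2m+1)^3$ with $m = n - 1$, and it confirms that every node of the $(2n-1)^3$ grid is accounted for exactly once.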


4. Ordering Heuristics 

Several heuristics for ordering boundary node subsets were applied to two different
structures, a cube divided into 27 subcubes, each containing n^3 nodes, and an
“airplane” (Figure 6) constructed from 92 square plates, each containing n^2 nodes.
The substructure boundaries of the cube were divided into 98 subsets; there were 
220 boundary node subsets for the airplane. 

For each of the two structures, a rather arbitrary “natural” ordering (Order- 
ing 0) of boundary node subsets was initially chosen. Subsets of the cube were di- 
vided into three categories; all “faces” were numbered first, followed by all “edges”, 
and finally, all “points”. All airplane subsets composed a single category. Within a
category, boundary node subsets bordering on substructure 1 were numbered first,
followed by unnumbered subsets bordering on substructure 2, etc.

Figure 6. Structure of “Airplane”

The ordering heuristics described below are based on two observations of gen- 
eral graphs. (It should be noted, however, that the heuristics are applied not to 
general graphs, but rather to graphs with vertices consisting of sets of boundary 
nodes indistinguishable with respect to connectivity. Block reorderings are, there- 
fore, performed on the corresponding matrices.) First, the total number of parallel 
time steps is smaller for those orderings which number first those nodes adjacent to 
relatively few other nodes. These graph orderings correspond to matrix orderings 
which number first those rows for which the least amount of work must be performed 
during the elimination process. Thus, intuitively, work can begin on intermediate 
rows more quickly. Second, the total number of parallel time steps is larger for those 
orderings which tend to number adjacent nodes consecutively. These are orderings 
which cluster nonzero elements of the matrix near the diagonal. Numbering of this 
type occurs in the Cuthill-McKee [2] and reverse Cuthill-McKee [3] orderings, which 
are included for comparison. 

Ordering 1 (Cuthill-McKee): The first subset in the natural ordering 
is chosen arbitrarily as the starting subset and assigned the number 1. 
Then, for i = 1, ..., N (where N is the total number of subsets), find all
unnumbered subsets adjacent to subset i and number them in increasing
order of degree. 

Ordering 2 (Reverse Cuthill-McKee): This ordering is obtained by re- 
versing the Cuthill-McKee ordering described above. 
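The two orderings above can be sketched as follows. The sketch assumes a connected subset graph given as a dictionary of adjacency sets and breaks ties between subsets of equal degree by their labels, which is an assumption; the function name `cuthill_mckee` is hypothetical.

```python
def cuthill_mckee(adj, start):
    """Number subsets breadth-first, taking neighbours in increasing degree."""
    order = [start]
    numbered = {start}
    position = 0
    while position < len(order):
        current = order[position]
        unnumbered = [s for s in adj[current] if s not in numbered]
        # Increasing order of degree; equal degrees are ordered by label.
        unnumbered.sort(key=lambda s: (len(adj[s]), s))
        order.extend(unnumbered)
        numbered.update(unnumbered)
        position += 1
    return order


# Hypothetical four-subset graph.
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
ordering_1 = cuthill_mckee(graph, 1)
ordering_2 = list(reversed(ordering_1))
print(ordering_1, ordering_2)
```

Because each subset is numbered immediately after the subsets adjacent to earlier ones, adjacent subsets receive nearby numbers, which is the clustering near the diagonal discussed above.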

In the ordering descriptions that follow, two classes of variables, DEG and 
ADJ, are associated with each subset. 

DEG variables hold the degree of a subset, the number of subsets to which that 
subset is adjacent. By selecting the subset with minimum DEG value to be ordered 
next, orderings which number subsets from lowest to highest degree are produced. 
DEG1 contains the degree of a subset calculated a priori. Its value does not change
during the ordering process. Some ordering heuristics eliminate a subset when it 
has been numbered and pairwise connect all subsets adjacent to this subset. DEG2 
is dynamically updated to reflect the resulting changes in degree. The DEG2 value 
of a subset is thus dependent upon the ordering selected and may increase as fill 
occurs or decrease as adjacent nodes are ordered and eliminated.
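The dynamic update behind DEG2 amounts to one elimination step on the subset graph. A minimal sketch, assuming the graph is held as a dictionary of adjacency sets that is modified in place; the function name `eliminate` is hypothetical.

```python
def eliminate(graph, v):
    """Remove subset v and pairwise connect all subsets adjacent to it."""
    neighbours = graph.pop(v)
    for u in neighbours:
        graph[u].discard(v)
        graph[u] |= neighbours - {u}
    return graph


# DEG2 may increase: eliminating the centre of a star joins all of its leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
eliminate(star, 0)
print({s: len(nbrs) for s, nbrs in star.items()})

# DEG2 may decrease: in a triangle the remaining subsets are already adjacent.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
eliminate(triangle, 1)
print({s: len(nbrs) for s, nbrs in triangle.items()})
```

After each elimination, the DEG2 value of a subset is simply the size of its current adjacency set.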

ADJ variables hold information about the set of numbered subsets to which 
an unnumbered subset is adjacent. By selecting the subset with minimum ADJ 
value to be ordered next, orderings which spread nonzero matrix elements, rather 
than cluster nonzero elements near the diagonal, are produced. ADJ1 contains the
cardinality of this set of numbered subsets; ADJ2 contains the maximum subset
number from this set. ADJ3 attempts to reflect both the set cardinality and the
maximum value in the set. Whenever a subset i is numbered, for each unnumbered
adjacent subset j, ADJ3(j) (initially zero) gets the value one plus the maximum of
ADJ3(i) and ADJ3(j). ADJ4 is simply a flag which contains the value one if the
set of adjacent numbered subsets is non-empty and the value zero otherwise. For 
the problems considered, all ADJ4 flags were quickly set. Therefore, whenever it 
was detected that all flags were set, all were reset to zero. 
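The bookkeeping for the first three ADJ variables can be sketched as below. Two points are assumptions rather than statements from the text: adjacency is taken in the original subset graph (without fill), and ADJ2 is set to zero when no adjacent subset has been numbered. ADJ4 and its reset rule are omitted. The function name `adj_values` is hypothetical.

```python
def adj_values(adj, numbered):
    """Return {subset: (ADJ1, ADJ2, ADJ3)} for the subsets not yet numbered.

    numbered lists the subsets already numbered, in the order they
    received the numbers 1, 2, ...
    """
    number = {s: pos for pos, s in enumerate(numbered, start=1)}
    adj3 = {s: 0 for s in adj}
    for i in numbered:
        for j in adj[i]:
            # j was still unnumbered at the moment i received its number.
            if j not in number or number[j] > number[i]:
                adj3[j] = 1 + max(adj3[i], adj3[j])
    values = {}
    for s in adj:
        if s in number:
            continue
        seen = [number[u] for u in adj[s] if u in number]
        values[s] = (len(seen), max(seen, default=0), adj3[s])
    return values


# Hypothetical example: a path a-b-c-d-e in which a and then c are numbered.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(adj_values(path, ["a", "c"]))
```

In this example subset b is adjacent to both numbered subsets and therefore has the largest values, so a heuristic that minimizes an ADJ variable would defer it.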

For each of the orderings, the variables of primary and secondary importance 
are listed in Table 2. The unnumbered subset with minimum value for the variable of 
primary importance is numbered next. Ties are broken by minimizing the variable of 
secondary importance. Any remaining ties are broken by numbering first the subset 
appearing earliest in the “natural” ordering. Orderings 8, 9, and 10 are equivalent 
in a sense to the minimum degree algorithm [8], but correspond to matrix reordering 
by blocks rather than individual elements.

Table 2. Heuristic Orderings

Ordering              3      4      5      6      7      8      9      10
Primary Variable      DEG1   ADJ1   ADJ2   ADJ3   ADJ4   DEG2   DEG2   DEG2
Secondary Variable    ADJ1   DEG1   DEG1   DEG1   DEG1   none   ADJ1   ADJ2
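The selection rule can be sketched for the two simplest variables. The code below covers only DEG1 and ADJ1, that is, orderings 3 and 4; it assumes ADJ1 is counted in the original subset graph, and the function name `heuristic_order` and the five-subset example are hypothetical.

```python
def heuristic_order(adj, natural, primary, secondary):
    """Greedy numbering by (primary, secondary, natural position).

    adj maps each subset to its set of adjacent subsets, natural lists all
    subsets in the natural ordering, and primary and secondary are each
    "DEG1" or "ADJ1".
    """
    rank = {s: pos for pos, s in enumerate(natural)}
    numbered = set()
    order = []

    def value(name, s):
        if name == "DEG1":
            return len(adj[s])               # degree computed a priori
        if name == "ADJ1":
            return len(adj[s] & numbered)    # numbered subsets adjacent to s
        raise ValueError(f"unsupported variable: {name}")

    while len(order) < len(natural):
        candidates = [s for s in natural if s not in numbered]
        best = min(candidates,
                   key=lambda s: (value(primary, s), value(secondary, s), rank[s]))
        order.append(best)
        numbered.add(best)
    return order


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
natural = ["a", "b", "c", "d", "e"]
print(heuristic_order(path, natural, "DEG1", "ADJ1"))   # ordering 3
print(heuristic_order(path, natural, "ADJ1", "DEG1"))   # ordering 4
```

On this small path both rules number the two end subsets first and postpone the subsets adjacent to them, so adjacent subsets do not receive consecutive numbers.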


5. Testing 

Each of the heuristic orderings for boundary node subsets was applied both 
to the cube problem and to the airplane problem. For each ordering, parallel 
Gaussian elimination as described in [5] was applied to the resulting Kbb matrix. 
The total number of parallel time steps, the maximum number of processors used 
during any one time step, and the average number of processors used per time 
step were determined for problems of various sizes. Values of n ranged from three to
nine (each subcube contained n^3 nodes and each plate of the airplane contained
n^2 nodes). It did not appear necessary to examine larger problems because of
the regularity of the data. For every test case, it was possible to determine a 
complexity expression which exactly matched the parallel time step results for all 
values of n greater than four. Processor usage results were not quite as regular over 
the same range of values for n. Some complexity expressions appear to describe 
exactly the asymptotic behavior of the maximum processor values; in other cases
(marked by the symbol ≈), asymptotic behavior was not reached within the test
range. All maximum processor data was, however, of sufficient regularity to allow
determination of the leading coefficient of the complexity expression with some 
degree of confidence. In addition, leading coefficients accurate to two decimal places 
were calculated for complexity expressions describing average processor usage. All 
complexity expressions are listed in Tables 3 and 4. 

Actual values for parallel time steps and average number of processors for 
selected orderings are plotted in Figures 7, 8, 9, and 10. Data for several orderings 
are not plotted because the curves would lie extremely close to other curves which
are plotted. In all cases, the position of an unplotted curve relative to the positions
of plotted curves may be determined by examination of the leading coefficient of
the appropriate complexity expressions.

Table 3. Complexity of Heuristic Orderings for Cube Problem

                                       maximum               average
Ordering    Total steps                processors            processors
    0       108n^2 - 216n + 109        11.0n^2 + O(n)        7.16n^2 + O(n)
    1       108n^2 - 216n + 109        18.0n^2 + O(n)        9.92n^2 + O(n)
    2       108n^2 - 216n + 109        8.0n^2 + O(n)         5.52n^2 + O(n)
    3       88n^2 - 176n + 89          14.5n^2 + O(n)        8.55n^2 + O(n)
    4       79n^2 - 139n + 59          14.5n^2 + O(n)        8.58n^2 + O(n)
    5       71n^2 - 120n + 46          12.5n^2 + O(n)        8.28n^2 + O(n)
    6       71n^2 - 119n + 47          14.5n^2 + O(n)        9.20n^2 + O(n)
    7       70n^2 - 117n + 44          12.5n^2 + O(n)        8.40n^2 + O(n)
    8       63n^2 - 115n + 51          ≈ 14.0n^2 + O(n)      7.83n^2 + O(n)
    9       58n^2 - 102n + 43          ≈ 14.0n^2 + O(n)      8.50n^2 + O(n)
   10       63n^2 - 114n + 50          13.0n^2 + O(n)        7.83n^2 + O(n)

Table 4. Complexity of Heuristic Orderings for Airplane Problem

                                 maximum             average
Ordering    Total steps          processors          processors
    0       312n - 499           23.0n + O(1)        13.10n + O(1)
    1       312n - 499           26.0n + O(1)        16.30n + O(1)
    2       216n - 344           13.5n + O(1)        6.69n + O(1)
    3       123n - 133           32.0n + O(1)        13.69n + O(1)
    4       129n - 140           40.0n + O(1)        13.96n + O(1)
    5       132n - 142           44.0n + O(1)        15.60n + O(1)
    6       155n - 185           ≈ 37.0n + O(1)      18.00n + O(1)
    7       113n - 109           39.0n + O(1)        14.04n + O(1)
    8       106n - 147           30.0n + O(1)        10.81n + O(1)
    9       110n - 151           36.5n + O(1)        10.86n + O(1)
   10       118n - 161           33.5n + O(1)        10.15n + O(1)

Experiments were conducted to compare the heuristic orderings with random
orderings. For the cube problem, 200 random orderings were examined; for the
airplane problem, 100. Average results from this testing are listed in Table 5 and 
plotted in Figures 7, 8, 9, and 10. 


Table 5. Average Complexities from Random Orderings

                        Cube Problem          Airplane Problem
Total time steps:       102.72n^2 + O(n)      256.04n + O(1)
Maximum processors:     21.96n^2 + O(n)       50.63n + O(1)
Average processors:     11.67n^2 + O(n)       27.58n + O(1)


Figures 11 and 12 are histograms of the leading coefficients of the complexity 
expressions for total number of parallel time steps for the random orderings of the 
cube problem and airplane problem, respectively. 



Figure 11. Leading Coefficient of the Parallel Time Complexity Expression for Random Orderings of the Cube Problem

Figure 12. Leading Coefficient of the Parallel Time Complexity Expression for Random Orderings of the Airplane Problem


6. Conclusions 

Methods to order matrices so that nonzero elements are clustered near the di- 
agonal (such as Cuthill-McKee and reverse Cuthill-McKee) have been widely used 
in matrix algorithm implementations for sequential processors. There are three 
primary reasons for the popularity of these techniques: (a) they are easily applied 
to general matrices, (b) storage schemes for the reordered matrices are simple, and 
(c) fill is limited to the region near the diagonal. This work, however, demonstrates 
that these profile-reducing orderings are ill-suited for Kbb matrices to which parallel 
elimination is to be applied. The data of Figure 11 appear to fit a normal distribu- 
tion truncated at the upper end. This truncation suggests that perhaps 108 is the 
maximum possible value for the leading coefficient of the parallel time complexity 
expression for the cube problem. If so, Cuthill-McKee and reverse Cuthill-McKee 
(which cluster nonzero elements near the diagonal) produce orderings which are 
among the worst with respect to total number of time steps required for paral- 
lel Gaussian elimination. For the airplane problem, the Cuthill-McKee ordering is 
worse than any of the 100 random orderings. In the other heuristic orderings, the 
function of the ADJ class of variables is to prevent clustering near the diagonal, 
thus increasing the possibility of parallel execution.

The attempt to develop heuristics for Kbb matrix orderings for parallel elimination
appeared to be highly successful. Reordering the boundary node subsets
significantly reduced the required number of parallel time steps. The best orderings 
for the cube required slightly more than one-half of the time steps required by the 
Cuthill-McKee ordering. For the airplane, time step results from the best orderings 
were approximately one-third of the Cuthill-McKee values. Similar results hold if
comparison is made with average random orderings. Orderings which produced the 
best results with respect to total number of time steps were those which minimized 
DEG2 as the variable of primary importance, i.e., variations of the minimum degree
algorithm. Additional work is needed, however, to establish which tie-breaking
rules are best. For example, the superior performance of ordering 8 over both or- 
derings 9 and 10 in the airplane problem is unexpected, since ordering 8 simply 
uses ordering 0, rather than ADJ variables, to break ties. 
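The ratios quoted in this paragraph can be checked against the total-step expressions of Tables 3 and 4. The short sketch below does so at n = 9 and in the limit of large n; the Table 4 entry for ordering 8 is taken as 106n - 147, which is a reading of a partly illegible scanned value, and the helper name `steps` is hypothetical.

```python
def steps(coefficients, n):
    """Evaluate a polynomial in n given its coefficients, highest power first."""
    total = 0
    for c in coefficients:
        total = total * n + c
    return total


cube_cuthill_mckee = (108, -216, 109)   # ordering 1, Table 3
cube_best = (58, -102, 43)              # ordering 9, Table 3
plane_cuthill_mckee = (312, -499)       # ordering 1, Table 4
plane_best = (106, -147)                # ordering 8, Table 4 (assumed reading)

n = 9
print(steps(cube_best, n) / steps(cube_cuthill_mckee, n))     # ratio at n = 9
print(cube_best[0] / cube_cuthill_mckee[0])                   # large-n limit
print(steps(plane_best, n) / steps(plane_cuthill_mckee, n))
print(plane_best[0] / plane_cuthill_mckee[0])
```

The cube ratios lie slightly above one-half and the airplane ratios close to one-third, in line with the statements above.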

Maximum processor data and average processor data appear to be closely (but 
not perfectly) correlated. The average number of processors used per time step is 
probably a more important measure of an ordering than the maximum number of 
processors used during any one time step. The reason for this is as follows. If the 
maximum required number of processors is not available, some work which could 
be performed at a particular time step will not be. However, that work which is 
performed will likely produce more work for the next time step. Consequently, the 
work to be performed at any given time could consist of both deferred work and 
newly available work. It appears, therefore, that if slightly more than the average 
processor requirement were available, total parallel time step values would not be 
adversely affected to any great extent. Further studies in which the number of 
processors is limited are needed to substantiate this conjecture.